Nonlinear activations for convolutional neural network acoustic models
Abstract
Following their triumphs in visual recognition tasks, convolutional neural networks (CNNs) have recently been used to learn the emission probabilities of hidden Markov models in speech recognition. The key distinction of CNNs over deep neural networks (DNNs) is that they leverage translational invariance in the frequency domain, such that weights are shared and there are significantly fewer parameters to train. Since the acoustics of speech indeed display some translational invariance, CNNs could provide more powerful models than DNNs for various speech recognition tasks. Here, we compare the per-frame state classification accuracy of several popular nonlinear activation functions, both sigmoidal and rectified, for a small-scale CNN: the logistic function, the hyperbolic tangent function, the rectified linear unit, the leaky rectified linear unit, and the soft-plus function. We find that the leaky rectified linear unit and soft-plus function perform best by far, which suggests their potential in full-scale CNN acoustic models.
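For reference, the five activation functions named in the abstract can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions, not the authors' implementation. The function names and the leak slope of 0.01 are assumptions made here; the abstract does not state the slope that was used.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), branched to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent, a sigmoidal function with range (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, otherwise negative_slope * x.

    The default slope of 0.01 is an assumption, not a value from the paper.
    """
    return x if x > 0 else negative_slope * x


def softplus(x: float) -> float:
    """Softplus log(1 + exp(x)), rewritten as max(x, 0) + log1p(exp(-|x|))
    so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical lookup table, convenient when sweeping over activations.
ACTIVATIONS = {
    "logistic": logistic,
    "tanh": hyperbolic_tangent,
    "relu": relu,
    "leaky_relu": leaky_relu,
    "softplus": softplus,
}

if __name__ == "__main__":
    for name, fn in ACTIVATIONS.items():
        print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```

The logistic and hyperbolic tangent functions are the sigmoidal (saturating) options; the rectified linear unit and its leaky variant are piecewise linear, and softplus is a smooth approximation of the rectified linear unit.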
Similar resources
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy continues to improve, a considerable gap remains between their performance and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations
Neural network acoustic models have significantly advanced state-of-the-art speech recognition over the past few years. However, they are usually computationally expensive due to the large number of matrix-vector multiplications and nonlinearity operations. Neural network models also require significant amounts of memory for inference because of the large model size. For these two reasons, it i...
Investigating the performance of machine learning-based methods in classroom reverberation time estimation using neural networks (Research Article)
Classrooms, as one of the most important educational environments, play a major role in the learning and academic progress of students. Reverberation time, one of the most important acoustic parameters of a room, has a significant effect on sound quality. The inefficiency of classical formulas such as Sabine's led this article to examine the use of machine learning methods as an alternat...
A Radon-based Convolutional Neural Network for Medical Image Retrieval
Image classification and retrieval systems have gained more attention because of easier access to high-tech medical imaging. However, the lack of large-scale, balanced, labelled data in medicine is still a challenge. Simplicity, practicality, efficiency, and effectiveness are the main targets in the medical domain. To achieve these goals, Radon transformation, which is a well-known t...
Cystoscopy Image Classification Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart method has been proposed in the field of medical image processing for diagnosing bladder cancer from cystoscopy images, despite its high prevalence worldwide. In this paper, two well-known convolutional neural networks (CNNs) ...